Lesson 2: Image Segmentation

⏳ Note (Kernel Starting): This notebook takes about 30 seconds to be ready to use. You may start and watch the video while you wait.

  • In this classroom, the libraries have already been installed for you.
  • If you would like to run this code on your own machine, install the following:
    !pip install ultralytics torch
    

Load the sample image

In [115]:
from PIL import Image
raw_image = Image.open("dogs.jpg")
raw_image
Out[115]:
(image output)

Note: the images referenced in this notebook have already been uploaded to this classroom's Jupyter directory for your convenience. For further details, please refer to the Appendix section located at the end of the lessons.

  • Resize the image.
In [116]:
from utils import resize_image
resized_image = resize_image(raw_image, input_size=1024)
resized_image
Out[116]:
(image output)
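The implementation of resize_image lives in the classroom's utils file and isn't shown here. As a rough sketch of what such a helper might do, the hypothetical resize_to_longest_side below scales the longest side of the image to input_size while preserving the aspect ratio (the name and behavior are assumptions, not the actual utils code):

```python
from PIL import Image

def resize_to_longest_side(image, input_size=1024):
    # Hypothetical stand-in for the classroom's resize_image helper:
    # scale so the longest side equals input_size, keeping the aspect ratio.
    w, h = image.size
    scale = input_size / max(w, h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    return image.resize((new_w, new_h))

# Example with a synthetic 2048x1408 image
img = Image.new("RGB", (2048, 1408))
resized = resize_to_longest_side(img, input_size=1024)
print(resized.size)  # -> (1024, 704)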

Import and prepare the model

In [117]:
import torch
In [118]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
Out[118]:
device(type='cpu')
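The check above falls back to CPU whenever no CUDA GPU is present (as in this classroom). If you run this locally on Apple silicon, you could extend the fallback chain with PyTorch's MPS backend; this is an optional sketch, not part of the lesson:

```python
import torch

# Prefer a CUDA GPU, then Apple-silicon MPS, then plain CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(device)
```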

Note: torch is PyTorch, the deep-learning framework used here to check for GPU availability and to run the model.

In [119]:
from ultralytics import YOLO
model = YOLO('./FastSAM.pt')

Note: FastSAM (Fast Segment Anything Model) is a CNN-based model that performs the Segment Anything task at much higher speed; its FastSAM.pt checkpoint is loaded through the Ultralytics YOLO interface.

Use the model

Note: utils is an additional file containing helper methods that have already been developed for you to use in this classroom. For further details, please refer to the Appendix section located at the end of the lessons.

In [120]:
from utils import show_points_on_image
In [121]:
# Define the coordinates for the point in the image
# [x_axis, y_axis]
input_points = [[350, 450]]
In [122]:
input_labels = [1]  # positive point
In [123]:
# Function written in the utils file
show_points_on_image(resized_image, input_points)
(image output)
In [124]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)
0: 704x1024 23 objects, 7193.8ms
Speed: 3.5ms preprocess, 7193.8ms inference, 195.1ms postprocess per image at shape (1, 3, 704, 1024)
  • Filter the mask based on the point defined before.
In [125]:
from utils import format_results, point_prompt
In [126]:
results = format_results(results[0], 0)
In [127]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)
In [128]:
from utils import show_masks_on_image
In [129]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])
Out[129]:
(image output)
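The internals of point_prompt are hidden in the utils file. One common way to implement point prompting over a set of candidate masks is to union the masks that contain positive points and subtract the masks that contain negative points; the sketch below (with the hypothetical name select_mask_by_points) illustrates that idea on toy data and is not the classroom's actual implementation:

```python
import numpy as np

def select_mask_by_points(masks, points, labels):
    """Combine candidate masks based on point prompts.

    masks:  boolean array of shape (n, H, W), one mask per detected object
    points: list of [x, y] coordinates
    labels: 1 for a positive point, 0 for a negative point
    """
    H, W = masks.shape[1:]
    selected = np.zeros((H, W), dtype=bool)
    for (x, y), label in zip(points, labels):
        for mask in masks:
            if mask[y, x]:                # note: row index is y, column is x
                if label == 1:
                    selected |= mask      # add the region under a positive point
                else:
                    selected &= ~mask     # carve out the region under a negative point
    return selected

# Tiny example: two 4x4 candidate masks
m = np.zeros((2, 4, 4), dtype=bool)
m[0, :2, :] = True   # top half
m[1, :, :2] = True   # left half
out = select_mask_by_points(m, [[0, 0], [3, 0]], [1, 0])
```

On this toy example, the positive point at (0, 0) selects both candidate masks, and the negative point at (3, 0) carves the top-half mask back out, leaving the bottom-left corner.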
  • Define 'semantic masks': provide two positive points that should be masked together.
In [130]:
# Specify two points in the same image
# [x_axis, y_axis]
input_points = [[350, 450], [620, 450]]
In [131]:
# Specify both points as a "positive prompt"
input_labels = [1, 1]  # both positive points
In [132]:
# Visualize the points defined before
show_points_on_image(resized_image, input_points)
(image output)
In [133]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)
0: 704x1024 23 objects, 6979.6ms
Speed: 2.6ms preprocess, 6979.6ms inference, 197.8ms postprocess per image at shape (1, 3, 704, 1024)
In [134]:
results = format_results(results[0], 0)
In [135]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)
In [136]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])
Out[136]:
(image output)

Note: the results obtained from running this notebook may vary slightly from those demonstrated by the instructor in the video.

  • Identify subsections of the image by adding a negative prompt.
In [137]:
# Define the coordinates for the regions to be masked
# [x_axis, y_axis]
input_points = [[350, 450], [400, 300]]
In [138]:
input_labels = [1, 0]  # positive prompt, negative prompt
In [139]:
# Visualize the points defined above
show_points_on_image(resized_image, input_points, input_labels)
(image output)

Note: From the image above, the red star indicates the negative prompt and the green star the positive prompt.

In [140]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)
0: 704x1024 23 objects, 6023.0ms
Speed: 2.7ms preprocess, 6023.0ms inference, 201.0ms postprocess per image at shape (1, 3, 704, 1024)
In [141]:
results = format_results(results[0], 0)
In [142]:
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)
In [143]:
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])
Out[143]:
(image output)

Note: in the image above, only the jacket of the dog on the left was segmented: the positive prompt kept the jacket while the negative prompt excluded the rest of the dog.

Prompting with bounding boxes

In [144]:
from utils import box_prompt
In [145]:
# Set the bounding box coordinates
# [xmin, ymin, xmax, ymax]
input_boxes = [530, 180, 780, 600]
In [146]:
from utils import show_boxes_on_image
In [147]:
# Visualize the bounding box defined with the coordinates above
show_boxes_on_image(resized_image, [input_boxes])
(image output)
  • Now, isolate the mask matching the box from the model's full output.
In [148]:
from utils import box_prompt
In [149]:
results = model(resized_image, device=device, retina_masks=True)
0: 704x1024 23 objects, 6111.7ms
Speed: 2.6ms preprocess, 6111.7ms inference, 289.9ms postprocess per image at shape (1, 3, 704, 1024)
In [150]:
# Extract the raw masks from the model output
masks = results[0].masks.data
In [151]:
masks
Out[151]:
tensor([[[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        ...,

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]])
In [152]:
# Convert to a True/False boolean mask
masks = masks > 0
In [153]:
masks
Out[153]:
tensor([[[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        ...,

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]],

        [[False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         ...,
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False],
         [False, False, False,  ..., False, False, False]]])
In [154]:
masks, _ = box_prompt(masks, input_boxes)
In [155]:
# Visualize the masks
show_masks_on_image(resized_image, [masks])
Out[155]:
(image output)
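As with point_prompt, the box_prompt implementation is hidden in utils. A common strategy for box prompting is to score every candidate mask by its IoU (intersection over union) with the box and keep the best match; the hypothetical select_mask_by_box below sketches this idea on toy data and is not the actual classroom code:

```python
import numpy as np

def select_mask_by_box(masks, box):
    """Pick the candidate mask that best matches a bounding box.

    masks: boolean array of shape (n, H, W)
    box:   [xmin, ymin, xmax, ymax]
    """
    xmin, ymin, xmax, ymax = box
    box_area = (xmax - xmin) * (ymax - ymin)
    best_iou, best_mask = -1.0, None
    for candidate in masks:
        inter = candidate[ymin:ymax, xmin:xmax].sum()   # mask pixels inside the box
        union = candidate.sum() + box_area - inter
        iou = inter / union
        if iou > best_iou:
            best_iou, best_mask = iou, candidate
    return best_mask, best_iou

# Tiny example: two 8x8 masks; the box exactly covers the second one
m = np.zeros((2, 8, 8), dtype=bool)
m[0, :2, :2] = True
m[1, 2:6, 2:6] = True
mask, iou = select_mask_by_box(m, [2, 2, 6, 6])
```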
In [156]:
# Print the segmentation mask in its raw format
masks
Out[156]:
array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])
In [157]:
# To visualize, import matplotlib
from matplotlib import pyplot as plt
In [158]:
# Plot the binary mask as an image
plt.imshow(masks, cmap='gray')
Out[158]:
<matplotlib.image.AxesImage at 0x7f6014914710>
(image output)
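Once the mask is a plain boolean array, basic statistics follow directly with NumPy; for example, counting the segmented pixels and the fraction of the image they cover. The mask below is a synthetic placeholder at the resized image's 704x1024 resolution, not the real model output:

```python
import numpy as np

# Synthetic placeholder mask at the resized image's resolution
mask = np.zeros((704, 1024), dtype=bool)
mask[180:600, 530:780] = True  # pretend this rectangle was the segmented region

pixel_count = int(mask.sum())        # number of segmented pixels
coverage = pixel_count / mask.size   # fraction of the image covered
print(pixel_count, round(coverage, 3))  # -> 105000 0.146
```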

Try yourself!

Try the image segmentation techniques shown above on your own images.

In [159]:
# Upload your own image, or use one of the sample images provided, for example: younes.png
# The image younes.png is already uploaded in this classroom
raw_image = Image.open('younes.png')
raw_image
Out[159]:
(image output)
In [162]:
# Resize the image
from utils import resize_image
resized_image = resize_image(raw_image, input_size=1024)
resized_image
Out[162]:
(image output)
In [164]:
# Define the coordinates for the point: [x_axis, y_axis]
input_points = [[700, 450]]
In [165]:
# Label the point as a positive prompt
input_labels = [1]
In [166]:
show_points_on_image(resized_image, input_points)
(image output)
In [167]:
# Run the model
results = model(resized_image, device=device, retina_masks=True)
results = format_results(results[0], 0)
# Generate the masks
masks, _ = point_prompt(results, input_points, input_labels)
# Visualize the generated masks
show_masks_on_image(resized_image, [masks])
0: 640x1024 99 objects, 7412.4ms
Speed: 3.3ms preprocess, 7412.4ms inference, 993.9ms postprocess per image at shape (1, 3, 640, 1024)
Out[167]:
(image output)

Additional Resources

  • For more on how to use Comet for experiment tracking, check out this Quickstart Guide and the Comet Docs.
  • This course was based on a set of two blog articles from Comet. Explore them here for more on how to use newer versions of Stable Diffusion in this pipeline, additional tricks to improve your inpainting results, and a breakdown of the pipeline architecture:
    • SAM + Stable Diffusion for Text-to-Image Inpainting
    • Image Inpainting for SDXL 1.0 Base Model + Refiner